PYTHON-5536 Avoid clearing the connection pool when the server connection rate limiter triggers #2509

blink1073 · 2025-08-26T01:51:32Z

Currently testing with this script for async:

import asyncio

from pymongo import AsyncMongoClient

# Allow up to 100 connections to be established concurrently per pool.
client = AsyncMongoClient(maxConnecting=100)


async def target():
    await client.admin.command("ping")


async def main():
    # Enable the server-side connection establishment rate limiter with tight limits.
    await client.admin.command('setParameter', 1, ingressConnectionEstablishmentRateLimiterEnabled=True)
    await client.admin.command('setParameter', 1, ingressConnectionEstablishmentRatePerSec=30)
    await client.admin.command('setParameter', 1, ingressConnectionEstablishmentBurstCapacitySecs=1)
    await client.admin.command('setParameter', 1, ingressConnectionEstablishmentMaxQueueDepth=1)

    # Warm the pool so there are existing connections.
    tasks = [asyncio.create_task(target()) for _ in range(10)]
    await asyncio.wait(tasks)

    # Issue a burst of concurrent commands that forces many new connections.
    tasks = [asyncio.create_task(target()) for _ in range(200)]
    await asyncio.wait(tasks)


asyncio.run(main())

and this one for sync:

from concurrent.futures import ThreadPoolExecutor, wait

from pymongo import MongoClient

# Allow up to 100 connections to be established concurrently per pool.
client = MongoClient(maxConnecting=100)


def target():
    client.admin.command("ping")


def main():
    # Enable the server-side connection establishment rate limiter with tight limits.
    client.admin.command('setParameter', 1, ingressConnectionEstablishmentRateLimiterEnabled=True)
    client.admin.command('setParameter', 1, ingressConnectionEstablishmentRatePerSec=30)
    client.admin.command('setParameter', 1, ingressConnectionEstablishmentBurstCapacitySecs=1)
    client.admin.command('setParameter', 1, ingressConnectionEstablishmentMaxQueueDepth=1)

    with ThreadPoolExecutor(200) as pool:
        # Warm the pool so there are existing connections.
        print("1")
        tasks = [pool.submit(target) for _ in range(10)]
        wait(tasks)

        # Issue a burst of concurrent commands that forces many new connections.
        print("2")
        tasks = [pool.submit(target) for _ in range(200)]
        wait(tasks)


main()

ShaneHarvey · 2025-08-26T16:51:37Z

pymongo/asynchronous/pool.py

+                            conn.conn.get_conn.read(1)
+                        except Exception as _:
+                            # TODO: verify the exception
+                            close_conn = False


2 comments:

I believe this logic needs to move to connection checkout. Here at connection check-in we already know the connection is usable, because we're checking it back in after a successful command.

Instead of a 1ms read can we reuse the existing _perished() + conn_closed() methods?
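For readers unfamiliar with checkout-time liveness checks, here is a minimal sketch of the general idea: poll an idle socket without blocking and peek for EOF. It is not PyMongo's _perished() or conn_closed() implementation; the helper name peer_has_closed is hypothetical, and the approach assumes a plain (non-TLS) blocking socket.

```python
import select
import socket


def peer_has_closed(sock: socket.socket) -> bool:
    """Best-effort check that the remote end closed or reset an idle socket.

    Hypothetical helper for illustration only, not a PyMongo API.
    """
    # A zero timeout makes select() a non-blocking poll.
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        # Nothing pending: the socket still looks open.
        return False
    try:
        # MSG_PEEK leaves any real data in the receive buffer.
        # An empty result means the peer sent EOF.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except (BlockingIOError, InterruptedError):
        return False
    except OSError:
        # For example ECONNRESET: treat the connection as dead.
        return True
```

Running a check like this at checkout, rather than check-in, matches the point above: a connection that was just used successfully is known to be good, while one that sat idle in the pool may have been closed by the server.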

ShaneHarvey

Nice work!

ShaneHarvey · 2025-09-10T18:20:23Z

pymongo/asynchronous/pool.py

+            if not self.is_sdam and type(e) == AutoReconnect:
+                self._backoff += 1
+                e._add_error_label("SystemOverloaded")
+                e._add_error_label("Retryable")


We need to move this logic so that it covers the TCP+TLS handshake, which happens above.

I set a breakpoint in the TCP+TLS handshake error handler and confirmed that handshakes are succeeding. The error only occurs on hello/auth.

Okay, I'm actually surprised by this, since the design (SPM-4319) indicates the rate limiter rejection happens before the TLS handshake.

blink1073 · 2025-09-11T11:45:13Z

Ideally we'd like to detect "connection reset by peer" errors and only back off for those, but in my testing, during backoff, we can sometimes get a regular EOF instead of a closing error, which takes the raise OSError("connection closed") path.
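To make the distinction concrete, here is a small sketch of how a reset-by-peer error could be told apart from the generic OSError("connection closed") raised on a plain EOF. The function name is_connection_reset is hypothetical and this is not the logic in the PR; it only shows why the EOF case carries no errno to key on.

```python
import errno


def is_connection_reset(exc):
    """Return True if exc, or anything in its cause/context chain, is ECONNRESET.

    Hypothetical helper for illustration only.
    """
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        if isinstance(exc, ConnectionResetError):
            return True
        if isinstance(exc, OSError) and exc.errno == errno.ECONNRESET:
            return True
        # Follow explicit chaining first, then implicit chaining.
        exc = exc.__cause__ or exc.__context__
    return False
```

An OSError built from only a message, such as OSError("connection closed"), has errno set to None, so a check like this returns False for it. That is the ambiguity described above.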

NoahStapp · 2025-09-11T14:47:45Z

pymongo/network_layer.py

        else:
-            if self._closing_exception:
-                raise self._closing_exception
+            if self._closed.done():


Is calling is_closing here better? It'll catch more edge cases in theory.

Hmm let me try that.

No, it is ambiguous as to whether connection_lost has been called yet. Since connection_lost is synchronous, checking self._closed.done() ensures that we have actually lost the connection.
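The difference can be seen in a small standalone sketch, assuming the stock asyncio event loops, where transport.close() marks the transport as closing immediately but schedules connection_lost for a later loop iteration. TrackingProtocol and its closed future are hypothetical stand-ins for the protocol in pymongo/network_layer.py, not a copy of it.

```python
import asyncio
import socket


class TrackingProtocol(asyncio.Protocol):
    """Records connection loss in a future (hypothetical stand-in)."""

    def __init__(self):
        self.closed = asyncio.get_running_loop().create_future()

    def connection_lost(self, exc):
        # Called by the event loop once the connection is really gone.
        if not self.closed.done():
            self.closed.set_result(exc)


async def demo():
    loop = asyncio.get_running_loop()
    ours, theirs = socket.socketpair()
    transport, proto = await loop.create_connection(TrackingProtocol, sock=ours)

    transport.close()
    closing_now = transport.is_closing()  # already True
    lost_now = proto.closed.done()        # connection_lost has not run yet

    await proto.closed                    # let the loop deliver connection_lost
    lost_later = proto.closed.done()

    theirs.close()
    return closing_now, lost_now, lost_later
```

Between close() and the callback, is_closing() is true while the future is still pending, which is the window in which is_closing() alone cannot say whether the connection has actually been lost.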

ShaneHarvey · 2025-09-19T18:03:48Z

pymongo/asynchronous/pool.py

+        ):
+            self._backoff += 1
+            error._add_error_label("SystemOverloaded")
+            error._add_error_label("Retryable")


Could you merge backpressure? Originally I added the incorrect labels here. They should be "SystemOverloadedError" and "RetryableError".
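The hunks in this thread only show the _backoff counter being incremented; they do not show how the counter is turned into a wait. As an illustration only, one common choice is exponential backoff with full jitter. The function name, the base and cap values, and the formula below are all assumptions, not what the PR implements.

```python
import random


def backoff_delay(attempt, base=0.1, cap=10.0, rng=random.random):
    """Seconds to wait before the next connection attempt.

    Assumed formula (not taken from the PR): full jitter over an
    exponentially growing ceiling, capped at `cap` seconds.
    """
    if attempt <= 0:
        return 0.0
    ceiling = min(cap, base * 2 ** (attempt - 1))
    # rng() is expected to return a float in [0.0, 1.0).
    return rng() * ceiling
```

The jitter spreads retries out so that many clients rejected at the same moment do not reconnect in lockstep.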

ShaneHarvey · 2025-09-19T18:03:51Z

pymongo/asynchronous/pool.py

+            self._backoff += 1
+            error._add_error_label("SystemOverloaded")
+            error._add_error_label("Retryable")
+            print(f"Setting backoff in {phase}:", self._backoff)  # noqa: T201


Instead of inspecting the error message after the fact, is it possible we can record some state to determine if the error happened during DNS+TCP or after? Like:

# Assume all non dns/tcp/timeout errors mean the server rejected the connection due to overload.
if not errorDuringDnsTcp and not timeoutError:
    error._add_error_label("SystemOverloadedError")
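A self-contained sketch of that suggestion follows, recording which phase of connection establishment was in progress when the error occurred. Phase, labels_for_failure, and establish are hypothetical names, and the connect and handshake callables stand in for the real pool code.

```python
from enum import Enum


class Phase(Enum):
    TCP_CONNECT = "tcp_connect"    # DNS resolution and TCP connect
    POST_CONNECT = "post_connect"  # everything after the socket is connected


def labels_for_failure(phase, exc):
    """Choose error labels from where the failure happened (illustrative)."""
    if phase is Phase.TCP_CONNECT or isinstance(exc, TimeoutError):
        return []
    # Assumption from the comment above: other failures mean overload.
    return ["SystemOverloadedError"]


def establish(connect, handshake):
    """Run connect() then handshake(conn), tracking the current phase."""
    phase = Phase.TCP_CONNECT
    try:
        conn = connect()
        phase = Phase.POST_CONNECT
        handshake(conn)
        return conn, []
    except OSError as exc:
        return None, labels_for_failure(phase, exc)
```

This only works when the code can observe the boundary between phases, which is the limitation raised in the replies below.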

DNS is already resolved by the time we make a pool as far as I can tell, and we can't distinguish between TCP connection and TLS handshake for async.

I added some logic in 7548f7b that looks for a specific error attached to AutoReconnect.

For discussion: we can still run into the condition where we hit this line, there is no closing error, and we have not yet received any data on the protocol. I verified this by setting a flag when buffer_updated is called. As far as I can tell from the Protocol/Transport docs and the available properties on each, we don't have a way to ascribe more semantic meaning to this condition.

The problem got even worse when using gevent, since the error was something completely different, so I reverted any extra handling of the connection error.

blink1073 · 2025-09-22T22:23:13Z

Successful PERF run with the current state (f1294dc): https://spruce.mongodb.com/task/sys_perf_perf_3_node_replSet.availability.arm.aws.2024_05_connection_storm_rate_limited_locust_patch_d67246eb210c68496ec38c597f9a1e2cb76af448_68d1c72d8f08a5000737aaf6_25_09_22_22_01_34/logs?execution=0

blink1073 and others added 15 commits August 19, 2025 11:23

PYTHON-5503 Use uv to install just in GitHub Actions (mongodb#2490)

d24b4a5

PYTHON-5502 Fix c extensions on OIDC VMs (mongodb#2489)

3a26119

Prep for 4.14.1 release (mongodb#2495) [master] (mongodb#2496)

db3d3c7

PYTHON-5143 Support auto encryption in unified tests (mongodb#2488)

f7b94be

PYTHON-5496 Update CSOT tests for change in dropIndex behavior in 8.3 (…

9a9a65c

…mongodb#2498)

PYTHON-5508 - Add built-in DecimalEncoder and DecimalDecoder (mongodb…

5e96353

…#2499)

PYTHON-5456 Support text indexes with auto encryption (mongodb#2500)

e08284b

PYTHON-5510 Fix server selection log message for commitTransaction (m…

ddf9508

…ongodb#2503)

PYTHON-5514 Specific assertions for "is" and "is not None" (mongodb#2502

3ebd934

)

Bump pyright from 1.1.403 to 1.1.404 (mongodb#2506)

cd4e5db

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Update coverage requirement from <=7.10.3,>=5 to >=5,<=7.10.5 (mongod…

9892e1b

…b#2507) Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

DRIVERS-3218 Avoid clearing the connection pool when the server conne…

1179c5c

…ction rate limiter triggers

set to one byte

bc91967

Bump the actions group with 5 updates (mongodb#2505)

8c361be

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Steven Silvester <[email protected]>

PYTHON-5519 Clean up uv handling (mongodb#2510)

0d4c84e

ShaneHarvey reviewed Aug 26, 2025

blink1073 and others added 13 commits August 26, 2025 19:45

update approach

f51e8a5

Merge branch 'master' of github.com:mongodb/mongo-python-driver into …

c1fe2e3

…DRIVERS-3218

Revert "Merge branch 'master' of github.com:mongodb/mongo-python-driv…

7584d2d

…er into DRIVERS-3218" This reverts commit c1fe2e3, reversing changes made to f51e8a5.

undo topology changes

f1544aa

improve sleep translation

9d34e52

improve sleep translation

bb5ac35

PYTHON-5519 Clean up uv handling (mongodb#2510)

957a87d

(cherry picked from commit 0d4c84e)

add prose tests

9d0af17

debug

da0c0e5

fix and update tests

70b4113

fix logic

c974d36

only backoff if conn is closed

845f17a

use AutoReconnect

09fc66d

blink1073 added 7 commits September 5, 2025 06:15

fix test

fa5c151

fix test

0ab78e4

Revert "update handshake error tests"

07d0233

This reverts commit 532c1b8.

update maxConnecting test

771570d

fix maxconnecting test

6623261

fix handling of maxConnecting

f602d4c

update test

64aa0af

ShaneHarvey changed the title ~~DRIVERS-3218 Avoid clearing the connection pool when the server connection rate limiter triggers~~ PYTHON-5536 Avoid clearing the connection pool when the server connection rate limiter triggers Sep 9, 2025

ShaneHarvey marked this pull request as ready for review September 9, 2025 20:08

ShaneHarvey requested a review from a team as a code owner September 9, 2025 20:08

blink1073 added 2 commits September 10, 2025 08:36

fix test

7f6335e

undo lock file changes

6db793d

ShaneHarvey reviewed Sep 10, 2025

blink1073 and others added 6 commits September 10, 2025 14:33

PYTHON-5538 Clean up uv lock file handling (mongodb#2522)

8c2eb91

(cherry picked from commit 98e9f5e)

wip

a84a181

wip

c5ce8dd

update backoff criteria

679807e

update backoff criteria

7e9f19f

update backoff criteria

b0b5800

blink1073 added 2 commits September 11, 2025 09:20

handle the already closed case

a033c58

handle another edge case

ded90b0

NoahStapp reviewed Sep 11, 2025

ShaneHarvey reviewed Sep 19, 2025

blink1073 added 5 commits September 22, 2025 07:28

Merge branch 'backpressure' of github.com:mongodb/mongo-python-driver…

2cd3c18

… into DRIVERS-3218

address review

0b4b265

clean up logic

7548f7b

clean up logic

5e34bdc

do not try and label errors

f1294dc
